In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Chapter 1 - Introduction to Machine Learning

This chapter introduces some common concepts in machine learning (such as supervised and unsupervised learning) and some simple applications.

Supervised Learning

  • Classification (categorical labels)
  • Regression (real-valued responses)

We learn from a dataset of input points together with their true response variables. Taking a probabilistic approach to this kind of inference, we want the probability distribution of the response $y$ given the training dataset $\mathcal{D}$ and a new point $x$ outside of it:

$$p(y\ |\ x, \mathcal{D})$$

A good point estimate $\hat{y}$ for $y$ is the maximum a posteriori (MAP) estimator:

$$\hat{y} = \underset{c}{\mathrm{argmax}}\ p(y = c\ |\ x, \mathcal{D})$$
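
As an illustration of this rule, the sketch below (not one of the original notebook cells; the class priors and Gaussian likelihoods are made up rather than estimated from data) scores each candidate class by $p(x\ |\ y = c)\ p(y = c)$ and returns the argmax, using scipy.stats.norm for the densities:

# MAP classification with made-up class priors and Gaussian class-conditionals
from scipy.stats import norm

priors = {"a": 0.7, "b": 0.3}                      # p(y = c)
likelihoods = {"a": norm(0, 1), "b": norm(3, 1)}   # p(x | y = c)

def map_predict(x):
    # p(y = c | x) is proportional to p(x | y = c) * p(y = c)
    posteriors = {c: likelihoods[c].pdf(x) * priors[c] for c in priors}
    return max(posteriors, key=posteriors.get)

print(map_predict(0.5))   # -> 'a'
print(map_predict(2.5))   # -> 'b'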

Unsupervised Learning

  • Clustering
  • Dimensionality Reduction / Latent variables
  • Discovering graph structure
  • Matrix completion

Parametric models

These models have a finite (and fixed) number of parameters, independent of the size of the training set. Examples:

  • Linear regression: $$y(\mathbf{x}) = \mathbf{w}^\intercal\mathbf{x} + \epsilon$$

    which can be written probabilistically as

$$p(y\ |\ x, \theta) = \mathcal{N}(y\ |\ \mu(x), \sigma^2) = \mathcal{N}(y\ |\ w^\intercal x, \sigma^2)$$
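
As a side note, the maximum-likelihood weights of this model have a closed-form least-squares solution. The sketch below (illustrative only, independent of the ../src/LinearRegression.py implementation used in the next cell) recovers a known slope and intercept from synthetic data:

# Closed-form least-squares fit on synthetic data (illustrative sketch)
rng = np.random.RandomState(0)
X_demo = np.linspace(0, 10, 50)[:, np.newaxis]
y_demo = 2.0 * X_demo + 1.0 + rng.normal(0, 1, (50, 1))

# Prepend a column of ones so that w = [intercept, slope]
X_design = np.hstack((np.ones_like(X_demo), X_demo))

# Maximum-likelihood solution: w = (X^T X)^{-1} X^T y
w_hat = np.linalg.lstsq(X_design, y_demo, rcond=None)[0]
print(w_hat.ravel())   # approximately [1.0, 2.0]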

In [2]:
%run ../src/LinearRegression.py
%run ../src/PolynomialFeatures.py

# LINEAR REGRESSION

# Generate noisy quadratic data
X = np.linspace(0, 20, 10)[:, np.newaxis]
y = 0.1 * (X ** 2) + np.random.normal(0, 2, 10)[:, np.newaxis] + 20

# Fit model to data
lr = LinearRegression()
lr.fit(X, y)

# Predict at the two endpoints (enough to draw the fitted line)
x_test = np.array([0, 20])[:, np.newaxis]
y_predict = lr.predict(x_test)


# POLYNOMIAL REGRESSION (degree-2 features + linear regression)

# Fit model to data
poly = PolynomialFeatures(2)
lr = LinearRegression()
lr.fit(poly.fit_transform(X), y)

# Predict on a fine grid to draw a smooth curve
x_pol = np.linspace(0, 20, 100)[:, np.newaxis]
y_pol = lr.predict(poly.fit_transform(x_pol))

In [3]:
# Plot data

fig = plt.figure(figsize=(14, 6))

# Plot linear regression
ax1 = fig.add_subplot(1, 2, 1)
ax1.scatter(X, y)
ax1.plot(x_test, y_predict, "r")
ax1.set_xlim(0, 20)
ax1.set_ylim(0, 50)

# Plot polynomial regression
ax2 = fig.add_subplot(1, 2, 2)
ax2.scatter(X, y)
ax2.plot(x_pol, y_pol, "r")
ax2.set_xlim(0, 20)
ax2.set_ylim(0, 50);


  • Logistic regression: despite the name, this is a classification model:
$$p(y\ |\ x, w) = \mathrm{Ber}(y\ |\ \mu(x)) = \mathrm{Ber}(y\ |\ \mathrm{sigm}(w^\intercal x))$$
where the sigmoid (logistic) function is

$$\displaystyle \mathrm{sigm}(\eta) = \frac{e^\eta}{1+e^\eta} = \frac{1}{1+e^{-\eta}}$$
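
Unlike linear regression, there is no closed-form maximum-likelihood solution for $w$, so the model is usually fit iteratively. The sketch below (illustrative only; it does not show the update rule used by ../src/LogisticRegression.py) fits a toy 1-D problem by gradient descent on the negative log-likelihood:

# Gradient-descent logistic regression on a toy 1-D problem (illustrative sketch)
def sigm(eta):
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.RandomState(1)
x_toy = np.hstack((rng.normal(-2, 1, 50), rng.normal(2, 1, 50)))
y_toy = np.array([0] * 50 + [1] * 50)

Phi = np.column_stack((np.ones_like(x_toy), x_toy))     # design matrix [1, x]
w = np.zeros(2)
for _ in range(5000):
    grad = np.dot(Phi.T, sigm(np.dot(Phi, w)) - y_toy)  # gradient of the negative log-likelihood
    w -= 0.01 * grad / len(y_toy)

print(w)                             # intercept near 0, positive slope
print(sigm(np.dot(w, [1.0, 0.0])))   # probability ~0.5 at the midpoint between the classes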

In [5]:
%run ../src/LogisticRegression.py

# Two well-separated 1-D Gaussian clusters, labeled 0 and 1
X = np.hstack((np.random.normal(90, 2, 100), np.random.normal(110, 2, 100)))[:, np.newaxis]
y = np.array([0]*100 + [1]*100)[:, np.newaxis]

# Fit model to data
logr = LogisticRegression(learnrate=0.002, eps=0.001)
logr.fit(X, y)

# Decision boundary: sigm(w0 + w1*x) = 0.5 at x = -w0/w1
# (this assumes logr.w stores [bias, slope]; the predicted probability there should be ~0.5)
w = np.ravel(logr.w)
x_test = np.array([[-w[0] / w[1]]])
y_probs = logr.predict_proba(x_test)[:, 0:1]
print("Probability: " + str(y_probs))


In [37]:
# Plot data

fig = plt.figure(figsize=(14, 6))

# Plot sigmoid function
ax1 = fig.add_subplot(1, 2, 1)
t = np.linspace(-15, 15, 100)
ax1.plot(t, logr._sigmoid(t))

# Plot the data and the decision-boundary point (predicted probability ~0.5)
ax2 = fig.add_subplot(1, 2, 2)
ax2.scatter(X, y)
ax2.scatter(x_test, y_probs, c='r');



Non-parametric models

These models don't have a fixed, finite number of parameters: the number of parameters grows with the amount of training data, as in $K$-nearest neighbors (KNN):

$$p(y=c\ |\ x, \mathcal{D}, K) = \frac{1}{K} \sum_{i \in N_K(x, \mathcal{D})} \mathbb{I}(y_i = c)$$
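
This formula translates almost directly into code. The sketch below (illustrative only, independent of the ../src/KNearestNeighbors.py implementation used in the next cell) computes the class-probability estimate for a single query point by brute force:

# Brute-force estimate of p(y = c | x, D, K) for a single query point
def knn_predict_proba(x, X_train, y_train, K, classes):
    dists = np.sum((X_train - x) ** 2, axis=1)     # squared distances to all training points
    neighbors = y_train[np.argsort(dists)[:K]]     # labels of the K nearest neighbors
    return {c: np.mean(neighbors == c) for c in classes}

X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]])
y_demo = np.array([0, 0, 1, 1])
print(knn_predict_proba(np.array([0.05, 0.05]), X_demo, y_demo, K=3, classes=[0, 1]))
# -> roughly {0: 0.67, 1: 0.33}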

In [8]:
%run ../src/KNearestNeighbors.py

# Generate data from 3 gaussians
gaussian_1 = np.random.multivariate_normal(np.array([1, 0.0]), np.eye(2)*0.01, size=100)
gaussian_2 = np.random.multivariate_normal(np.array([0.0, 1.0]), np.eye(2)*0.01, size=100)
gaussian_3 = np.random.multivariate_normal(np.array([0.1, 0.1]), np.eye(2)*0.001, size=100)
X = np.vstack((gaussian_1, gaussian_2, gaussian_3))
y = np.array([1]*100 + [2]*100 + [3]*100)

# Fit the model
knn = KNearestNeighbors(5)
knn.fit(X, y)

# Predict on a 50 x 50 grid of points covering the plane
XX, YY = np.mgrid[-5:5:.2, -5:5:.2]
X_test = np.hstack((XX.ravel()[:, np.newaxis], YY.ravel()[:, np.newaxis]))
y_test = knn.predict(X_test)

In [9]:
fig = plt.figure(figsize=(14, 6))

# Plot original data
ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(X[y == 1,0], X[y == 1,1], 'bo')
ax1.plot(X[y == 2,0], X[y == 2,1], 'go')
ax1.plot(X[y == 3,0], X[y == 3,1], 'ro')

# Plot predicted data
ax2 = fig.add_subplot(1, 2, 2)
ax2.contourf(XX, YY, y_test.reshape(50,50));


Curse of dimensionality

The curse of dimensionality refers to a collection of problems that arise when dealing with high-dimensional data sets. For example, in the KNN model, if we assume the data is uniformly distributed over an $N$-dimensional unit cube (with large $N$), then most of the points lie near the faces of the cube, so a neighborhood has to stretch far from the query point to contain $K$ neighbors and KNN loses its locality property.
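
A quick numerical illustration (a sketch added here, not an original notebook cell): to capture a fixed fraction $f$ of uniformly distributed points, a sub-cube centered on a query point needs an edge length of $f^{1/N}$, which approaches the full edge of the unit cube as $N$ grows:

# Edge length of the sub-cube needed to capture a fraction f of the data
# in the N-dimensional unit cube, assuming uniformly distributed points
f = 0.1
for N in [1, 2, 10, 100]:
    print("N = %3d -> edge length = %.3f" % (N, f ** (1.0 / N)))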


In [ ]: